Hands-on Exercise 3

Programming Interactive Data Visualisation and Animated Statistical Graphics with R

Published

January 23, 2024

Modified

February 7, 2024

Summary

This hands-on exercise consists of two main topics:

  1. Programming Interactive Data Visualisation with R
  2. Programming Animated Statistical Graphics with R

1 Programming Interactive Data Visualisation with R

1.1 Loading R packages

pacman::p_load(ggiraph, plotly,
               patchwork, DT, tidyverse)

R packages for interactive data visualisation

  • ggiraph: makes ‘ggplot’ graphics interactive.
  • plotly: R library for plotting interactive statistical graphs.
  • DT: provides an R interface to the JavaScript library DataTables, which creates interactive tables on an HTML page.
  • tidyverse: a family of modern R packages designed to support data science, analysis and communication tasks, including creating static statistical graphs.
  • patchwork: combines multiple ggplot2 graphs into one figure.

1.2 Importing the Data

exam_data <- read_csv("data/Exam_data.csv")

1.3 Overview of the data

exam_data
# A tibble: 322 × 7
   ID         CLASS GENDER RACE    ENGLISH MATHS SCIENCE
   <chr>      <chr> <chr>  <chr>     <dbl> <dbl>   <dbl>
 1 Student321 3I    Male   Malay        21     9      15
 2 Student305 3I    Female Malay        24    22      16
 3 Student289 3H    Male   Chinese      26    16      16
 4 Student227 3F    Male   Chinese      27    77      31
 5 Student318 3I    Male   Malay        27    11      25
 6 Student306 3I    Female Malay        31    16      16
 7 Student313 3I    Male   Chinese      31    21      25
 8 Student316 3I    Male   Malay        31    18      27
 9 Student312 3I    Male   Malay        33    19      15
10 Student297 3H    Male   Indian       34    49      37
# ℹ 312 more rows

1.4 Interactive Data Visualisation - ggiraph methods

ggiraph makes ‘ggplot’ graphics interactive through three aesthetics:

  • tooltip: text displayed when the mouse is over an element.
  • data_id: an id associated with elements (used for hover and click actions).
  • onclick: a JavaScript function executed when an element is clicked.

1.4.1 Tooltip effect with tooltip aesthetic

p <- ggplot(data=exam_data, 
       aes(x = MATHS)) +
  geom_dotplot_interactive(                          # Create basic graph
    aes(tooltip = ID),                               # specify tooltip here
    stackgroups = TRUE,                             
    binwidth = 1, 
    method = "histodot") +
  scale_y_continuous(NULL, 
                     breaks = NULL) +
  theme_minimal()

girafe(ggobj = p,                                 # generate svg object on an html page.
       width_svg = 6,
       height_svg = 6*0.618)
Tip

Hover over a dot to check out the student’s ID

Displaying multiple information on tooltip

# create a new field called tooltip with desired data 
exam_data$tooltip <- c(paste0("Name = ", exam_data$ID,         
                              "\n Class = ", exam_data$CLASS,
                              "\n Gender = ", exam_data$GENDER)) 

p <- ggplot(data=exam_data, 
       aes(x = MATHS)) +
  geom_dotplot_interactive(
    aes(tooltip = tooltip),                # newly created field used as tooltip field 
    stackgroups = TRUE,
    binwidth = 1,
    method = "histodot") +
  scale_y_continuous(NULL,               
                     breaks = NULL) +
  theme_minimal()

girafe(ggobj = p,
       width_svg = 8,
       height_svg = 8*0.618)
Tip

Hover over a dot. Now, more information is shown!
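
To see what the tooltip field holds before it reaches ggiraph, the paste0() step can be run on a tiny stand-in data frame. The sketch below uses base R only; `demo` and its three rows are hypothetical values for illustration, not taken from Exam_data.csv.

```r
# Hypothetical stand-in for exam_data (values invented for illustration)
demo <- data.frame(
  ID     = c("Student001", "Student002", "Student003"),
  CLASS  = c("3A", "3A", "3B"),
  GENDER = c("Female", "Male", "Female")
)

# paste0() is vectorised, so one tooltip string is built per row
demo$tooltip <- paste0("Name = ", demo$ID,
                       "\n Class = ", demo$CLASS,
                       "\n Gender = ", demo$GENDER)

# cat() renders "\n" as real line breaks, much as the tooltip would
cat(demo$tooltip[1], "\n")
```

Because the string is stored as a column of the data frame, it can be mapped inside aes() by its bare column name.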

Customising Tooltip style

The code chunk below uses opts_tooltip() of ggiraph to customise tooltip rendering by adding CSS declarations.

tooltip_css <- 
"background-color:grey;
font-weight:bold;
color:black;
font-size: 1.2em"                     # customise tooltip css (font-weight, not font-style, sets bold)

p <- ggplot(data=exam_data, 
       aes(x = MATHS)) +
  geom_dotplot_interactive(              
    aes(tooltip = ID),                   
    stackgroups = TRUE,                  
    binwidth = 1,                        
    method = "histodot") +               
  scale_y_continuous(NULL,               
                     breaks = NULL) +
  theme_minimal()

girafe(ggobj = p,                             
       width_svg = 6,                         
       height_svg = 6*0.618,
       options = list(opts_tooltip(css = tooltip_css)))  # add the tooltip_css here                                      
Note

The background colour of the tooltip is grey, and the font is black and bold.

Displaying statistics on tooltip

stat_summary() is used with mean_se() to compute the mean and its standard error (by default, mean_se() returns the mean plus and minus one standard error). The derived statistics are then displayed in the tooltip.

tooltip <- function(y, ymax, accuracy = .01) 
  {mean <- scales::number(y, accuracy = accuracy)
  sem <- scales::number(ymax - y, accuracy = accuracy)
  paste("Mean maths scores:", mean, "+/-", sem)}

gg_point <- ggplot(data=exam_data, 
                   aes(x = RACE)) +
  
  stat_summary(aes(y = MATHS, 
                   tooltip = after_stat(tooltip(y, ymax))),        # adding tool tip
    fun.data = "mean_se", 
    geom = GeomInteractiveCol,  
    fill = "light blue") +
  
  stat_summary(aes(y = MATHS),                        # adding error bar
    fun.data = mean_se,
    geom = "errorbar", width = 0.1, linewidth = 0.2) +   # linewidth replaces size for lines in ggplot2 >= 3.4.0
  theme_minimal()

girafe(ggobj = gg_point,
       width_svg = 8,
       height_svg = 8*0.618)
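
As a rough check on what the tooltip reports, the quantities returned by mean_se() can be reproduced by hand. By default, ggplot2's mean_se() returns the mean plus and minus one standard error (its mult argument is 1), so ymax - y in the tooltip function is one standard error. The scores below are made-up values, and the t multiplier at the end is one possible way to widen the band to a 90% confidence interval.

```r
scores <- c(50, 60, 70, 80, 90)   # hypothetical marks, not from Exam_data.csv

n  <- length(scores)
m  <- mean(scores)
se <- sd(scores) / sqrt(n)        # standard error of the mean

# What mean_se() reports by default: mean -/+ 1 standard error
se_band <- c(ymin = m - se, y = m, ymax = m + se)

# A two-sided 90% confidence interval uses a t multiplier instead of 1
t_mult  <- qt(0.95, df = n - 1)
ci_band <- c(ymin = m - t_mult * se, y = m, ymax = m + t_mult * se)

print(round(se_band, 2))
print(round(ci_band, 2))
```

With ggplot2 loaded, mean_se(scores) should give the same three values as se_band, and mean_se(scores, mult = t_mult) the wider band.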

1.4.2 Hover effect with data_id

p <- ggplot(data=exam_data, 
       aes(x = MATHS)) +
  
  geom_dotplot_interactive(           
    aes(data_id = CLASS),                        # specify data_id here       
    stackgroups = TRUE,               
    binwidth = 1,                        
    method = "histodot") +
  
  scale_y_continuous(NULL,               
                     breaks = NULL) +
  theme_minimal()

girafe(ggobj = p,                             
       width_svg = 6,                         
       height_svg = 6*0.618)                                             
Note

Elements associated with a data_id (i.e. CLASS) will be highlighted upon mouse over. The default highlight colour is orange.

Styling hover effect

The highlighting effect can be customised with CSS:

  • opts_hover() styles the geometries being hovered over.
  • opts_hover_inv() styles all the other geometries.

p <- ggplot(data=exam_data, 
       aes(x = MATHS)) +
  
  geom_dotplot_interactive(           
    aes(data_id = CLASS),                        # specify data_id here       
    stackgroups = TRUE,               
    binwidth = 1,                        
    method = "histodot") +
  
  scale_y_continuous(NULL,               
                     breaks = NULL) +
  theme_minimal()

girafe(ggobj = p,                             
       width_svg = 6,                         
       height_svg = 6*0.618,
       options = list(opts_hover(css = "fill: blue;"),          # effect on geometries
                      opts_hover_inv(css = "opacity:0.2;")))    # effect on other geometries
Note

Unlike the previous example, the CSS customisation here is passed directly as a string inside the opts_*() calls.
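
The same hover settings can also be kept in named variables and attached after the widget has been created. The sketch below is a minimal example under stated assumptions: it uses the built-in mtcars data in place of exam_data, with the cylinder count standing in for CLASS, and relies on ggiraph's girafe_options() to apply the options.

```r
library(ggplot2)
library(ggiraph)

# mtcars ships with R; the cylinder count plays the role of CLASS here
p_demo <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point_interactive(aes(data_id = as.character(cyl)), size = 3) +
  theme_minimal()

# CSS kept in variables so it can be reused across several plots
hover_css     <- "fill:blue;stroke:black;"
hover_inv_css <- "opacity:0.2;"

g <- girafe(ggobj = p_demo, width_svg = 6, height_svg = 6 * 0.618)

# girafe_options() applies opts_*() settings to an existing girafe object
g <- girafe_options(g,
                    opts_hover(css = hover_css),
                    opts_hover_inv(css = hover_inv_css))
```

Printing g renders the widget; hovering over one point should highlight every car with the same cylinder count and fade the rest.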

1.4.3 Combining tooltip and hover effect

p <- ggplot(data=exam_data, 
       aes(x = MATHS)) +
  
  geom_dotplot_interactive(           
    aes(tooltip = CLASS,                          # specify tooltip here  
        data_id =  CLASS),                        # specify data_id here       
    stackgroups = TRUE,               
    binwidth = 1,                        
    method = "histodot") +
  
  scale_y_continuous(NULL,               
                     breaks = NULL) +
  theme_minimal()

girafe(ggobj = p,                             
       width_svg = 6,                         
       height_svg = 6*0.618,
       options = list(opts_hover(css = "fill: blue;"),          # effect on geometries
                      opts_hover_inv(css = "opacity:0.2;")))    # effect on other geometries
Interactivity

Elements associated with a data_id (i.e. CLASS) will be highlighted upon mouse over. At the same time, the tooltip will show the CLASS.

1.4.4 Coordinated Multiple Views

# create a new field called tooltip with desired data 
exam_data$tooltip <- c(paste0("Name = ", exam_data$ID,         
                              "\n Class = ", exam_data$CLASS,
                              "\n Gender = ", exam_data$GENDER)) 

p1 <- ggplot(data=exam_data, 
       aes(x = MATHS)) +
  
  geom_dotplot_interactive(              
    aes(tooltip = tooltip,
        data_id = ID),              
    stackgroups = TRUE,                  
    binwidth = 1,                        
    method = "histodot") +  
  coord_cartesian(xlim=c(0,100)) + 
  scale_y_continuous(NULL,               
                     breaks = NULL) +
  theme_minimal()

p2 <- ggplot(data=exam_data, 
       aes(x = ENGLISH)) +
  
  geom_dotplot_interactive(              
    aes(tooltip = CLASS,
        data_id = ID),              
    stackgroups = TRUE,                  
    binwidth = 1,                        
    method = "histodot") + 
  coord_cartesian(xlim=c(0,100)) + 
  scale_y_continuous(NULL,               
                     breaks = NULL) +
  theme_minimal()

girafe(code = print(p1 / p2), 
       width_svg = 6,
       height_svg = 6,
       options = list(
         opts_hover(css = "fill: #202020;"),
         opts_hover_inv(css = "opacity:0.2;"))) 
Interactivity
  • Notice that when a data point in one of the dot plots is selected, the data point with the same ID in the second plot is highlighted too.

  • The data_id aesthetic is what links observations between the plots; the tooltip aesthetic is optional but helpful when the mouse is over a point. The two plots are stacked with the patchwork operator (p1 / p2).

1.4.5 Click effect with onclick

The onclick aesthetic of ggiraph provides hotlink interactivity on the web.

exam_data$onclick <- sprintf("window.open(\"%s%s\")",
"https://www.moe.gov.sg/schoolfinder?journey=Primary%20school",
as.character(exam_data$ID))

p <- ggplot(data=exam_data, 
       aes(x = MATHS)) +
  
  geom_dotplot_interactive(              
    aes(onclick = onclick),              
    stackgroups = TRUE,                  
    binwidth = 1,                        
    method = "histodot") +  
  
  scale_y_continuous(NULL,               
                     breaks = NULL) +
  theme_minimal()

girafe(ggobj = p,                             
       width_svg = 6,                         
       height_svg = 6*0.618)   
Interactivity

Clicking a data point opens the web document linked to it in the web browser.
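
The onclick field is just a string of JavaScript, so it helps to inspect it before plotting. The sketch below builds such strings in base R; the base URL and the IDs are hypothetical placeholders, and utils::URLencode() is added here (it is not part of the exercise code) so that characters such as spaces do not break the link.

```r
# Hypothetical base URL and IDs, used only to show the string that is built
base_url <- "https://example.com/search?q="
ids      <- c("Student 1", "Student 2")

# Escape each ID so that spaces and other reserved characters stay valid
encoded <- vapply(ids, utils::URLencode, character(1),
                  reserved = TRUE, USE.NAMES = FALSE)

# sprintf() recycles base_url and fills one window.open() call per ID
onclick <- sprintf('window.open("%s%s")', base_url, encoded)

print(onclick)
```

Using single quotes for the R string avoids having to escape the double quotes required by the JavaScript call.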

1.5 Interactive Data Visualisation - plotly methods

There are two ways to create interactive graphs with plotly: plot_ly() and ggplotly().

1.5.1 Using plot_ly()

plot_ly(data = exam_data, 
        x = ~MATHS, 
        y = ~ENGLISH)

plot_ly(data = exam_data, 
        x = ~ENGLISH, 
        y = ~MATHS, 
        color = ~RACE)

1.5.2 Using ggplotly()

  • Appropriate ggplot2 functions are used to create a scatter plot.
  • ggplotly() is used to convert the R graphic object into an interactive object.

p <- ggplot(data=exam_data, 
            aes(x = MATHS,
                y = ENGLISH)) +
  geom_point(size=1) +
  coord_cartesian(xlim=c(0,100),
                  ylim=c(0,100))
ggplotly(p)                                   # add this line
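
By default ggplotly() shows every mapped aesthetic on hover. A minimal sketch of narrowing this to a custom label is given below, assuming the built-in mtcars data as a stand-in for exam_data; the text aesthetic is not used by ggplot2 itself and is only picked up by ggplotly() through its tooltip argument.

```r
library(ggplot2)
library(plotly)

# mtcars stands in for exam_data; car names play the role of student IDs
cars <- transform(mtcars, model = rownames(mtcars))

p_demo <- ggplot(cars,
                 aes(x = wt, y = mpg,
                     text = paste0("Model: ", model,
                                   "<br>Cylinders: ", cyl))) +
  geom_point(size = 1)

# tooltip = "text" shows only the custom text aesthetic on hover
w <- ggplotly(p_demo, tooltip = "text")
```

Printing w displays the plot; "<br>" is used for line breaks because plotly hover labels are rendered as HTML.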

1.5.3 Coordinated Multiple Views with ggplotly()

Three steps for creating coordinated linked plot:

  1. highlight_key() of the plotly package is used to create the shared data.
  2. Two scatter plots are created by using ggplot2 functions.
  3. subplot() of the plotly package is used to place them side by side.

d <- highlight_key(exam_data)                        # Step 1 

p1 <- ggplot(data=d,                                 # Step 2
            aes(x = MATHS,
                y = ENGLISH)) +
  geom_point(size=1) +
  coord_cartesian(xlim=c(0,100),
                  ylim=c(0,100)) 

p2 <- ggplot(data=d, 
            aes(x = MATHS,
                y = SCIENCE)) +
  geom_point(size=1) +
  coord_cartesian(xlim=c(0,100),
                  ylim=c(0,100))

subplot(ggplotly(p1),                              # Step 3
        ggplotly(p2))
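
subplot() drops the axis titles of the individual plots unless asked to keep them. The sketch below shows one way to retain them, using the built-in mtcars data as a stand-in for exam_data; the specific margin value is an arbitrary choice for illustration.

```r
library(ggplot2)
library(plotly)

# Shared data: a selection made in one panel propagates to the other
d_demo <- highlight_key(mtcars)

p1_demo <- ggplot(d_demo, aes(x = wt, y = mpg)) + geom_point(size = 1)
p2_demo <- ggplot(d_demo, aes(x = wt, y = hp))  + geom_point(size = 1)

linked <- subplot(ggplotly(p1_demo), ggplotly(p2_demo),
                  nrows  = 1,
                  titleX = TRUE,    # keep the x-axis titles
                  titleY = TRUE,    # keep the y-axis titles
                  margin = 0.05)    # gap between the two panels
```

Printing linked shows both panels; clicking a point in one panel should highlight the same car in the other.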

1.6 Interactive Data Visualisation - crosstalk methods

  • Crosstalk is an add-on to the htmlwidgets package.
  • It extends htmlwidgets with a set of classes, functions, and conventions for implementing cross-widget interactions (currently, linked brushing and filtering).

1.6.1 Interactive Data Table: DT package

The DT package allows data objects to be rendered as interactive HTML tables.

DT::datatable(exam_data[c("ID","CLASS","GENDER","RACE","ENGLISH","MATHS","SCIENCE")], class= "compact")
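
datatable() accepts further arguments that control the rendered table. The sketch below illustrates a few commonly used ones on the built-in iris data, which stands in for exam_data; the particular values are arbitrary choices.

```r
library(DT)

tbl <- datatable(
  iris,                            # built-in data, standing in for exam_data
  class    = "compact",            # same table class as in the call above
  rownames = FALSE,                # hide the row-name column
  filter   = "top",                # add a filter box above each column
  options  = list(pageLength = 5)  # number of rows shown per page
)
```

Printing tbl renders the table in the viewer or in the knitted HTML page.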

1.6.2 Linked brushing: crosstalk method

d <- highlight_key(exam_data) 

p <- ggplot(d, 
            aes(ENGLISH, 
                MATHS)) + 
  geom_point(size=1) +
  coord_cartesian(xlim=c(0,100),
                  ylim=c(0,100))

gg <- highlight(ggplotly(p),        
                "plotly_selected")  

crosstalk::bscols(gg,               
                  DT::datatable(d), 
                  widths = 5)        

Things to learn from the code chunk:

highlight() is a function of plotly package. It sets a variety of options for brushing (i.e., highlighting) multiple plots. These options are primarily designed for linking multiple plotly graphs, and may not behave as expected when linking plotly to another htmlwidget package via crosstalk. In some cases, other htmlwidgets will respect these options, such as persistent selection in leaflet.

bscols() is a helper function of the crosstalk package. It makes it easy to put HTML elements side by side. It can be called directly from the console but is especially designed to work in an R Markdown document. Warning: this will bring in all of Bootstrap!
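
Besides linked brushing, crosstalk also supports filtering. highlight_key() creates a crosstalk SharedData object, and the same kind of object can be built directly and wired to filter controls. The sketch below is an illustration under stated assumptions: it uses the built-in mtcars data rather than exam_data, and the control ids ("cyl", "mpg") are arbitrary names.

```r
library(crosstalk)
library(plotly)

# Car names become the key that identifies each row across widgets
cars   <- transform(mtcars, model = rownames(mtcars))
shared <- SharedData$new(cars, key = ~model)

page <- bscols(
  widths = c(3, 9),                 # Bootstrap columns; widths sum to 12
  list(
    filter_select("cyl", "Cylinders", shared, ~cyl),
    filter_slider("mpg", "Miles per gallon", shared, ~mpg)
  ),
  plot_ly(shared, x = ~wt, y = ~mpg,
          type = "scatter", mode = "markers")
)
```

Printing page (or placing it in an R Markdown or Quarto chunk) shows the controls next to the scatter plot; changing a control should filter the points.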

2 Programming Animated Statistical Graphics with R

2.1 Loading R packages

pacman::p_load(readxl, gifski, gapminder,
               plotly, gganimate, tidyverse)

R packages for animated plots

  • plotly: plotting interactive statistical graphs.
  • gganimate: creating animated statistical graphs.
  • gifski: converts video frames to GIF animations using pngquant’s fancy features for efficient cross-frame palettes and temporal dithering. It produces animated GIFs that use thousands of colours per frame.
  • gapminder: an excerpt of the data available at Gapminder.org. Only its country_colors scheme is used here.
  • tidyverse: a family of modern R packages designed to support data science, analysis and communication tasks, including creating static statistical graphs.

2.2 Importing the Data

col <- c("Country", "Continent")
globalPop <- read_xls("data/GlobalPopulation.xls",
                      sheet="Data") %>%
  
# change "Country" and "Continent" (aka col) as factor 
  mutate(across(all_of(col), as.factor)) %>%  
  
# change "Year" as integer   
  mutate(Year = as.integer(Year))          

Alternatively, use mutate_at()

col <- c("Country", "Continent")
globalPop <- read_xls("data/GlobalPopulation.xls",
                      sheet="Data") %>%
  
# change "Country" and "Continent" (aka col) as factor 
  mutate_at(col, as.factor) %>%  
  
# change "Year" as integer   
  mutate(Year = as.integer(Year))          

Things to learn from the code chunk above

  • read_xls() of the readxl package is used to import the Excel worksheet.
  • mutate() of the dplyr package is used to create new columns or modify existing ones as functions of existing variables.
  • across() applies the same function to multiple columns.
  • mutate_at() applies a function to the selected columns, here converting Country and Continent into factors. It is superseded by across() in current dplyr.
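
The type conversion can be checked on a small table without the Excel file. The sketch below assumes dplyr 1.0 or later; the three rows are invented for illustration, and all_of() is used to make explicit that col is an external character vector of column names.

```r
library(dplyr)

# Hypothetical rows mimicking the layout of GlobalPopulation.xls
demo <- tibble(
  Country   = c("A", "B", "A"),
  Continent = c("X", "Y", "X"),
  Year      = c(1996, 1998, 2000)
)

col <- c("Country", "Continent")

demo <- demo %>%
  mutate(across(all_of(col), as.factor)) %>%  # character -> factor
  mutate(Year = as.integer(Year))             # double -> integer

str(demo)
```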

2.3 Overview of the data

globalPop
# A tibble: 6,204 × 6
   Country      Year Young   Old Population Continent
   <fct>       <int> <dbl> <dbl>      <dbl> <fct>    
 1 Afghanistan  1996  83.6   4.5     21560. Asia     
 2 Afghanistan  1998  84.1   4.5     22913. Asia     
 3 Afghanistan  2000  84.6   4.5     23898. Asia     
 4 Afghanistan  2002  85.1   4.5     25268. Asia     
 5 Afghanistan  2004  84.5   4.5     28514. Asia     
 6 Afghanistan  2006  84.3   4.6     31057  Asia     
 7 Afghanistan  2008  84.1   4.6     32738. Asia     
 8 Afghanistan  2010  83.7   4.6     34505. Asia     
 9 Afghanistan  2012  82.9   4.6     36416. Asia     
10 Afghanistan  2014  82.1   4.7     38327. Asia     
# ℹ 6,194 more rows

2.4 Animated Data Visualisation: gganimate methods

gganimate extends the grammar of graphics as implemented by ggplot2 to include the description of animation. It does this by providing a range of new grammar classes that can be added to the plot object in order to customise how it should change with time.

Sample Syntax

  • transition_*() defines how the data should be spread out and how it relates to itself across time.
  • view_*() defines how the positional scales should change along the animation.
  • shadow_*() defines how data from other points in time should be presented in the given point in time.
  • enter_*()/exit_*() defines how new data should appear and how old data should disappear during the course of the animation.
  • ease_aes() defines how different aesthetics should be eased during transitions.

ggplot(globalPop, aes(x = Old, y = Young, 
                      size = Population,     # the size of dot depends on population 
                      colour = Country)) +
  geom_point(alpha = 0.7, 
             show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  labs(x = '% Aged', 
       y = '% Young') 

ggplot(globalPop, aes(x = Old, y = Young, 
                      size = Population, 
                      colour = Country)) +
  geom_point(alpha = 0.7, 
             show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  labs(title = 'Year: {frame_time}', 
       x = '% Aged', 
       y = '% Young') +
  transition_time(Year) +              # add this line
  ease_aes('linear')                   # and this line

For animated plot:

  • transition_time() of gganimate is used to create transitions through distinct states in time (i.e. Year).
  • ease_aes() is used to control easing of aesthetics. The default is linear. Other methods are: quadratic, cubic, quartic, quintic, sine, circular, exponential, elastic, back, and bounce.
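
A non-default easing and explicit rendering settings can be sketched as follows. This example is an illustration under stated assumptions: it uses the gapminder data set from the gapminder package instead of globalPop, the file name gapminder_demo.gif is arbitrary, and rendering is guarded by interactive() so that the script can be sourced without producing a file.

```r
library(ggplot2)
library(gganimate)
library(gapminder)

anim_plot <- ggplot(gapminder,
                    aes(x = gdpPercap, y = lifeExp,
                        size = pop, colour = country)) +
  geom_point(alpha = 0.7, show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_x_log10() +
  labs(title = "Year: {frame_time}") +
  transition_time(year) +
  ease_aes("cubic-in-out")          # slow start and end, faster in between

if (interactive()) {
  # nframes and fps control the length and smoothness of the GIF
  anim <- animate(anim_plot, nframes = 50, fps = 10,
                  renderer = gifski_renderer())
  anim_save("gapminder_demo.gif", animation = anim)
}
```

Easing names combine a function with a modifier (in, out or in-out), so "cubic-in-out" applies cubic easing at both ends of each transition.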

2.5 Animated Data Visualisation: plotly

2.5.1 Using plot_ly()

bp <- globalPop %>%
  
  plot_ly(x = ~Old, 
          y = ~Young, 
          size = ~Population, 
          color = ~Continent,
          sizes = c(2, 100),
          frame = ~Year, 
          text = ~Country, 
          hoverinfo = "text",
          type = 'scatter',
          mode = 'markers'
          ) %>%
  layout(showlegend = FALSE)

bp
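
The playback of a plot_ly() animation can be tuned with animation_opts() and animation_slider(). The sketch below uses the gapminder data set from the gapminder package as a stand-in for globalPop; the timing values are arbitrary choices for illustration.

```r
library(plotly)
library(gapminder)

bp_demo <- gapminder %>%
  plot_ly(x = ~gdpPercap,
          y = ~lifeExp,
          size = ~pop,
          color = ~continent,
          frame = ~year,
          text = ~country,
          hoverinfo = "text",
          type = "scatter",
          mode = "markers") %>%
  layout(xaxis = list(type = "log"),
         showlegend = FALSE) %>%
  animation_opts(frame = 1000,       # milliseconds per frame
                 transition = 500,   # milliseconds spent transitioning
                 easing = "linear",
                 redraw = FALSE) %>%
  animation_slider(currentvalue = list(prefix = "Year: "))
```

Printing bp_demo shows the bubble plot with a play button and a slider labelled with the current year.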

2.5.2 Using ggplotly()

  • Appropriate ggplot2 functions are used to create a static bubble plot.
  • The output is then saved as an R object called gg.
  • ggplotly() is used to convert the R graphic object into an animated, interactive plotly object.

gg <- ggplot(globalPop, 
       aes(x = Old, 
           y = Young, 
           size = Population, 
           colour = Country)) +
  geom_point(aes(size = Population,
                 frame = Year),
             alpha = 0.7, 
             show.legend = FALSE) +                       # ignored by ggplotly(): the legend still appears
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  labs(x = '% Aged', 
       y = '% Young')

ggplotly(gg)

gg <- ggplot(globalPop, 
       aes(x = Old, 
           y = Young, 
           size = Population, 
           colour = Country)) +
  geom_point(aes(size = Population,
                 frame = Year),
             alpha = 0.7) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  labs(x = '% Aged', 
       y = '% Young') + 
  theme(legend.position='none')                       # use this instead

ggplotly(gg)

Things to learn from the code chunk above

  • Although the show.legend = FALSE argument was used in the first chunk, the legend still appears on the plot produced by ggplotly().
  • To overcome this problem, theme(legend.position='none') should be used, as in the second chunk.

3 Reference
